02A pairs experiments

Read the pairwise competitive outcomes determined by colony counting on TSA

The script below generates the table for manual keyin

if (FALSE) source("script/02A-pairs_experiment-01-pairwise_manual_key_in.R")

Then I manually checked the scanned plate images of competitive pairs. The plate images are saved in folder data/raw/plate_scan/emergent_coexistence_plate_scan/ divided according to the experimental batches.

Batch Community
B2 2.6, 2.8, 7.1, 8.4, 10.2, 11.1
C 11.1 isolate 1
C2 11.2
D 1.2, 1.4, 1.6, 1.7, 4.1, 11.5
# Read result colony counts and dilution factor
if (run_scripts) source("script/02A-pairs_experiment-02-colony_count.R")

There are 186x3=558 outcomes of pairwise competitions.

pairs_competition <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_competition.csv", col_types = cols())
pairs_competition

67 of of 558 pair-frequency without determined outcomes of pairwise competitions will be determined by using CASEU method

pairs_competition %>% filter(is.na(ColonyCount))

186 pair IDs saved in data/temp/pairs_ID.csv

pairs_ID <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_ID.csv", col_types = cols())
pairs_ID

Ambiguous pairs and isolates on TSA agar plates

if (run_scripts) source("script/02A-pairs_experiment-03-pairwise_ambiguous.R")

There are 67 pair-freqs that the competition outcome cannot be determined by TSA plate counting because of ambiguous morphology. The ambiguous pairs will be later examined by using selective media or Sanger sequencing.

pairs_ambiguous <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_ambiguous.csv", col_types = cols())
pairs_ambiguous

In total, 28 pairs

pairs_ambiguous %>% 
  group_by(Community, Isolate1, Isolate2) %>%
  summarize(Count = n(), .groups = "keep")

36 isolates involved in the ambiguous pairs.

isolates_ambiguous <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_ambiguous.csv", col_types = cols())
isolates_ambiguous

Map pairs and isolates to the DW96 plate layout

if (run_scripts) source("script/02A-pairs_experiment-04-pairwise_plate_layout.R")

Each plate has 96 rows and has the following variables

plates <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/plates.csv", col_types = cols())
plates

Plates for across-community and random assembly

plates_random <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/plates_random.csv", col_types = cols())
plates_random
plates_list <- plates %>%
    filter(Plate == "P1") %>%
    distinct(Batch, PlateLayout) %>%
    rename(batch = Batch, platelayout = PlateLayout) %>%
    rowwise() %>%
    mutate(plate = filter(plates, Batch == batch, PlateLayout == platelayout, Plate == "P1") %>% list) %>%
    mutate(plate = prepare_plate_draw(plate) %>% list) %>%
    mutate(p_plate = draw_plate_from_df(plate) %>% list) 

plot_grid(plotlist = plates_list$p_plate, labels = paste0(plates_list$batch, " ", plates_list$platelayout),
          ncol = 2)
Warning: Removed 2 rows containing missing values (geom_text).

Plate layout that takes different initial frequencies:

  • P1 is 50%:50%.

  • P2 and P3 are identical and whose rows are 95% and columns are 5%.

The only exception is plate C11R1 in batch C, which only has one plate.

OD <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/OD.csv", col_types = cols())

OD %>%
    filter(Community == "C1R2", Wavelength == 620) %>%
    ggplot(aes(x = Transfer, y = Abs, color = Isolate1Freq)) +
    geom_line() + geom_point() +
    facet_grid(Isolate1 ~ Isolate2) +
    theme_classic()

NA

02B CASEU sanger sequencing

list.files(here::here("output/protocol"), pattern = "caseu")
devtools::install_bitbucket('dattamanoshi/caseu') # Install CASEU package
library(CASEU)

Test on example data

if (run_scripts) source("script/02B-CASEU_sanger_seq-00-test.R")

CASEU pilot1

The plate layout of PCR plate and list of samples are specified in output/protocol/protocol_20190813_Sanger_seq_prep.pdf.

read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot1/protocol_20190813_Sanger_seq_prep-genewiz_table.csv", col_types = cols())

The isolates used in this round is from the list below.

isolates
Isolate Code Taxa
C11R1 isolate 1 A Pseudomonas
C1R7 isolate 2 B Pseudomonas
C1R7 isolate 1 C Enterobacter
C1R7 isolate 7 D Raoultella

Read trace matrices for isolates and mixtures

if (run_scripts) source("script/02B-CASEU_sanger_seq-01-pilot1.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot1.csv", col_types = cols())

CASEU pilot2

The plate layout of PCR plate and list of samples are specified in output/protocol/protocol_20190813_Sanger_seq_prep.pdf. There are 32 samples from pairwise competition and 16 samples from control. Also read Sylvie’s data.

read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot2/protocol_20190910_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())

Isolates A B C D in control are the isolates below.

Isolate Code Taxa
C11R1 isolate 1 A Pseudomonas
C1R7 isolate 2 B Pseudomonas
C1R7 isolate 1 C Enterobacter
C1R7 isolate 7 D Raoultella
if (run_scripts) source("script/02B-CASEU_sanger_seq-02-pilot2.R")
#source("script/02B-CASEU_sanger_seq-02a-pilot2_Sylvie.R") # Run Sylvie's sequence

In the 12 control synthetic pairs that were made of 4 isolates, compare these pairs’ OD frequencies, colony counts, and CASEU predictions.

Read CASEU predicted frequencies and colony count frequencies in the 12 control pairs.

CASEU_pilot2 <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot2.csv", col_types = cols())
CASEU_pilot2_plating <- read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot2/CASEU_pilot2_plating.csv", col_types = cols()) %>%
  mutate(Isolate1ColonyFreq = Isolate1Colony / (Isolate1Colony + Isolate2Colony),
    Isolate2ColonyFreq = Isolate2Colony / (Isolate1Colony + Isolate2Colony))
CASEU_pilot2

CASEU pilot3

The plate layout of PCR plate and list of samples are specified in output/protocol/protocol_20190923_SequalPrep_Sanger_prep.pdf.

read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot3/protocol_20190924_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())

Read trace matrices for isolates and mixtures

# It takes about ~15 mins to run 
if (run_scripts) source("script/02B-CASEU_sanger_seq-03-pilot3.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot3.csv", col_types = cols())

CASEU pilot4

The plate layout of PCR plate and list of samples are specified in output/protocol/protocol_20191007_Sanger_seq_prep.pdf.

read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot4/protocol_20191007_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())

Read trace matrices for isolates and mixtures

# It takes about ~15 mins to run
if (run_scripts) source("script/02B-CASEU_sanger_seq-04-pilot4.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot4.csv", col_types = cols())

CASEU six plates

  • Genewiz submission code: CASEU_RN_5 (which is not correct because these are not RN plates)
  • Plates:
    • B2 T7 933 P2
    • B2 T7 444 P2
    • C2 T7 13A P2
    • C2 T7 13B P2
    • D T7 75 P2
    • D T7 5543 P2
  • Notes: the plates must be placed in order!!
# Generate sequencing csv
if (FALSE) {
genewiz_table <- tibble(Sample = 1:(96*6), 
       `DNA Name` = paste0(rep(c("B2_T7_933_P2", "B2_T7_444_P2", "C2_T7_13A_P2", "C2_T7_13B_P2",  "D_T7_75_P2", "D_T7_5543_P2"), each = 96), "_", rep(1:96, 6)), 
       `Length (bp)` = 1000, `Concentration (ng/uL)` = 0.83, Primer = "27F")

write_csv(genewiz_table, here::here("output/protocol/tab_fig/20211202_Sanger_seq_prep-genewiz_table_CYC.csv"))

}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_sixplates.csv", col_types = cols())

Random network, CASEU batch 1

  • Genewiz submisstion code: Caseu_RN_1
  • Plate layout: T3 C P2
  • Description: samples 3, 4, 5, 12, 13, 27, 29, 33, 73, 85, 86, 91, and 94 (order by column in the plate layout) arrived Genewiz dry, so they are not processed.
# If will take ~2 hr to run
if (run_scripts) source("script/02B-random_network_CASEU-01-batch1.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN1.csv", col_types = cols())

Random network, CASEU batch 2

  • Genewiz submission code: Caseu_RN_2
  • Plate layout: T3 C P2
  • Description: this plate layout contains the isolates for all 4 random assembled communities
# It takes ~30 mins
if (run_scripts) source("script/02B-random_network_CASEU-02-batch2.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN2.csv", col_types = cols())

Random network, CASEU batch 3

  • Genewiz submisstion code: Caseu_RN_3
  • Plate layout:
    • P1 (T0 C P2), tracking number 30-333649143
    • P2 (T0 AD P2), tracking number 30-333649365
    • P3 (T3 AD P2), tracking number 30-333649517
  • Description: AD plates have the isolates assembled from the self-assembled communities but different communities
# It takes ~1.5 hr
if (run_scripts) source("script/02B-random_network_CASEU-03-batch3.R")

Use T0 OD frequencies

read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN3.csv", col_types = cols())

Random network, CASEU batch 4

  • Genewiz submisstion code: Caseu_RN_4
  • Plate layout: T3 BD P3
  • Description: This plate has the full AcrAss2 (B) and half of RanAss2 (D) network.
if (run_scripts) source("script/02B-random_network_CASEU-04-batch4.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN4.csv", col_types = cols())

02C pairs_OD_CFU

General formula of error propagation

This section explains the error propagation theory to estimate the uncertainty in the experimental measurement. There are one or more quantities \(x, y, ...\), with corresponding uncertainties \(\delta x, \delta y, ...\) and that we wish to use the measured values of x and y to calculate the quantity of real interest q. There are three provisional rules:

  1. The square-root rule for counting experiments. The average number of event in time T is \(v\pm\sqrt{v}\)

  2. Uncertainty in sums and differences. The computed mean value \(q=x+y+z-(u+w)\). If the uncertainties in x, …, w are known to be independent and random, then the uncertainty in the computed value of q is the quadratic sum \(\delta q = \sqrt{(\delta x)^2+(\delta y)^2+(\delta z)^2+(\delta u)^2+(\delta w)^2}\) , In any case, \(\delta q\) is never larger than their ordinary sum \(\delta q \leqslant \delta x + \delta y+ \delta z + \delta u + \delta w\)

  3. Uncertainty in product and quotients. \(q=\frac{x\times z}{u\times w}\), then the fractional uncertainty in the computed value q is the sum. If the uncertainties in x, …,w are independent and random, then the fractional uncertainty in q is the sum in quadrature of the original uncertainties, \(\frac{\delta q}{\left|q\right|} = \sqrt{(\frac{\delta x}{\left|x\right|})^2 + \frac{\delta y}{\left|y\right|})^2 +\frac{\delta z}{\left|z\right|})^2 +\frac{\delta u}{\left|u\right|})^2 + \frac{\delta w}{\left|w\right|})^2}\). \(\frac{\delta q}{\left| q \right|} \leqslant \frac{\delta x}{\left| x \right|} + \frac{\delta z}{\left| z \right|} + \frac{\delta u}{\left| u \right|} + \frac{\delta w}{\left| w \right|}\)

Taking these three provisional rules together, the general formula for error propagation takes the following form. Assume the computed quantity is a function of \(x_1, x_2, ..., x_n\), the uncertainty in q is \[\delta q= \sqrt{(\sum{\frac{\partial q}{\partial x_i}}\delta x_i)^2}\]

Uncertainty in epsilon

if (run_scripts) source("script/02C-pairs_OD_CFU-01-epsilon_uncertainty.R")

The uncertainties in each isolate’ epsilon \(\epsilon_A = \frac{OD_A DF_A v_A}{CFU_A}\) comes from four parts:

  1. \(DF\): Dilution factors. The uncertainty comes from the pipetting in serial dilution, which is calcaulted below.
  2. \(v\): Plating volumes. The systematic error for P20 set at 20 uL is 0.4 uL.
  3. \(CFU\): CFU counts. The uncertainty in CFU follows poisson distribution, which means that the variance is the same as the mean (measured CFU). The standard deviation is \(\sqrt{CFU}\).
  4. \(OD\): OD measurement. The uncertainty in measuring OD in plate reader, which is assumed to be 0.001.

Dilution factor

The uncertainty in dilution factor comes from the pipetting steps in serial dilution, which include two pipetting volumes:

  1. V1: serial dispensing 10 uL of diluted solution using mP20. It has uncertainty ErrorV1 2 uL.
  2. V2: dispense 90 uL of PBS using mP200. It has uncertainty ErrorV2 0.4 uL.

For dilution factor \(10^{n}\), it has the the mean \((\frac{V_1}{V_1+V_2})^n\). To obtain the uncertainty in the measured mean, first we caluculate the partial derivatives \(\frac{\partial DF}{\partial V_1}\) and \(\frac{\partial DF}{\partial V_2}\). Then the uncertainty \(\delta DF = \sqrt{(\frac{\partial DF}{\partial V_1}\delta V_1)^2 + (\frac{\partial DF}{\partial V_2}\delta V_2)^2}\)

dilution_factor <- tibble(n = c(4, 5), V1 = 10, V2 = 90, ErrorV1 = 0.4, ErrorV2 = 2) %>%
  mutate(PartialV1 = n * V1^(n-1) * V2 / (V1+V2)^(n+1), #-n*(V1+V2)^(n-1)*V2/(V1^(n+1)),
         PartialV2 = -n * V1^(n-1) / (V1+V2)^(n+1), #n*(V1+V2)^(n-1)/(V1^n),
         DF = (V1/(V1+V2))^n,
         ErrorDF = sqrt((PartialV1*ErrorV1)^2 + (PartialV2*ErrorV2)^2))

dilution_factor

By using error propagation theroy, the uncertainty in isolates epsilon has the form

\[\delta \epsilon = \sqrt{(\frac{\partial \epsilon}{\partial DF}\delta DF)^2 +(\frac{\partial \epsilon}{\partial CFU}\delta CFU)^2 + (\frac{\partial \epsilon}{\partial OD}\delta OD)^2 + (\frac{\partial \epsilon}{\partial V}\delta V)^2}\]

isolates_epsilon_uncertainty <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_epsilon_uncertainty.csv", col_types = cols())
isolates_epsilon_uncertainty
NA

There are 7 isolates that have either 0 CFU or negative OD value, so they don’t have epsilon values.

isolates_epsilon_uncertainty %>%
  filter(is.na(Epsilon))

Convert T0 OD frequencies to CFU frequencies

if (run_scripts) source("script/02C-pairs_OD_CFU-02-CFU_frequency.R")

The outcome of pairwise competition were determined by comparing the frequency changes between T0 and T8. The T8 frequencies were determined by plating the mature media on rich agar media on which the colonies were counted, whereas T0 frequencies were set to 95/5, 50/50 and 5/95 by mixing two isolate inocula with equal OD. In this section, I will use the OD-CFU conversion coefficient \(\epsilon\) derived from T8 isolate data to convert T0 OD frequencies into CFU frequencies. In specific, the CFU frequency of isolate 1 \(f^C_1\) of a pair can be derived from OD frequencies of isolate 1 and 2 \(f^O_1 ,f^O_2\), which have the relationship below.

\(f^C_1 = \frac{f^o_1\epsilon_1DFv}{f^o_1\epsilon_1DFv + f^o_2\epsilon_2DFv}=\frac{f^o_1\epsilon_1}{f^o_1\epsilon_1 + f^o_2\epsilon_2}\)

where \(\epsilon\) of each isolate was pre-calculated from T8 dataset. DF and v are the same in conversion.

read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_CFU_freq.csv", col_types = cols())

Uncertainty in T0 CFU frequencies

if (run_scripts) source("script/02C-pairs_OD_CFU-03-CFU_frequency_uncertainty.R")

Write CFU frequencies to data/temp/pairs_CFU_freq_uncertainty.csv

read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_CFU_freq_uncertainty.csv", col_types = cols())

02D determine pairwise interaction

Combine outcomes of pairwise competition from CFU counts and CASEU

Raw data (e.g., CFU counts and CASEU Sanger sequences) are processed and generated into temporary result csv:

  1. 02B-CASEU_sanger_seq reads CASEU raw data and outputs temp/CASEU_pilot2.csv and temp/CASEU_pilot3.csv. Both files are CASEU predicted T8 frequencies.

  2. 02C-pairs_OD_CFU reads pair_competition and dilution factor data, and it outputs temp/pairs_CFU_freq_uncertainty.csv, which has the T0 OD-converted CFU frequencies and T8 CFU frequencies with uncertainties.

if (run_scripts) source("script/02D-determine_pairwise_interaction-01-combine_CFU_CASEU_result.R")

The script in this section returns a data.frame pairs_freq that has the following variables:

  • Community
  • Isolate1 and Isolate2: indices of isolates within a community. The number of isolate1 is always smaller than isolate2
  • Isolate1InitialODFreq and Isolate2InitialODFreq: 5, 50 or 95. The initial OD frequencies of isolates at T0. This two serve as discrete grouping variables.
  • Time: T0 or T8.
  • Isolate1MeasuredFreq: the measured frequency od isolate1 in the pair.
  • ErrorIsolate1MeasuredFreq: the uncertainty of Isolate1MeasuredFreq.
  • RawDataType: OD, ODtoCFU, CFU, Sanger (CASEU). The raw data type in which the isolate frequencies were measured.
  • Contamination: logical. There are contamination in three pairs at T8 plates.

186x3x2=1116 pair-freq at two time points for the self-assembled community networks.

read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_freq.csv", col_types = cols())

Determine pairwise interactions

if (run_scripts) source("script/02D-determine_pairwise_interaction-02-determine_pairwise_interaction.R")

Table of all 27 + 4 possible combinations of fitness function and their interaction types

read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_interaction_table.csv", col_types = cols())

Table of interaction types

pairs_interaction_fitness <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_interaction_fitness.csv", col_types = cols())
pairs_interaction_fitness %>%
  group_by(Set, InteractionType) %>%
  summarize(Count = n(), .groups = "keep") 

Isolate tournament

if (run_scripts) source("script/02D-determine_pairwise_interaction-03-isolate_tournament.R")

Tournament ranks of each isolate. Note that I consider neturality and bistability as draw in the tournament.

  • Score: the competitive scores of isolate. This score is computed by the formula: number of wins - number of loses + 0 * number of draws.
  • Game: number of pairwise competition the isolate has played. The number of games an isolate plated within a community should be community size minus one.
  • Rank: the ranks based on Score. The ranks range from 1 to the focal community size. Isolates with the same scores in a community are given the same rank.
  • PlotRank: continuous rank for plotting convenience.

02E competition phylogeny

Isolates’ RDP taxonomy

if (run_scripts) source("script/02E-competition_phylogeny-01-pairs_taxonomy.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_taxonomy.csv", col_types = cols())

Isolates’ 16S sequence difference

if (run_scripts) source("script/02E-competition_phylogeny-02-pairs_16S.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_16S.csv", col_types = cols())

Summary

Read and combine pairs data

if (run_scripts) source("script/02-pairs-01-read_data.R")

186 pairs of self-assembled communities and 112 pair from across-community and random networks

Pairs with three initial frequencies and three. 186x3x2=1116

pairs_meta contains the growth traits of isolates. Note that the order of Isolate1 and Isolate2 in some pairs are flipped such that Isolate1 is always the dominant strain and Isolate 2 is the subdomaint one. Dominant strain is the winner in exclusion pairs, and the more abundant strain (50:50 treatment) in coexistence pairs.

if (FALSE) pairs_meta <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_meta.csv", col_types = cols())

Example outcome types

if (run_scripts) source("script/02-pairs-02-outcome_types.R")

Coarse-grained

Fine-grained

read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_example_outcomes_finer.csv", col_types = cols())
---
title: "Analysis on pairwise competition assays"
author: "Chang-Yu Chang"
date: "`r Sys.Date()`"
output:
  html_notebook:
    toc: yes
    number_sections: no
---

```{r setup, include = F}
knitr::opts_chunk$set(cache = FALSE, echo = TRUE)
library(tidyverse)
library(cowplot)
source(here::here("plotting_scripts/misc.R"))
run_scripts <- F
```


# 02A pairs experiments

- Experiment-related scripts

- Read pairwise competition outcomes that are determined by colony counts on plates. 

- Read CASEU results

- Map pairs and isolates to the 96-well plate layout


## Read the pairwise competitive outcomes determined by colony counting on TSA

The script below generates the table for manual keyin

```{r}
if (FALSE) source("script/02A-pairs_experiment-01-pairwise_manual_key_in.R")
```

Then I manually checked the scanned plate images of competitive pairs. The plate images are saved in folder `data/raw/plate_scan/emergent_coexistence_plate_scan/` divided according to the experimental batches.

Batch | Community
------|-----------
B2    | 2.6, 2.8, 7.1, 8.4, 10.2, 11.1
C     | 11.1 isolate 1
C2    | 11.2
D     | 1.2, 1.4, 1.6, 1.7, 4.1, 11.5


``` {r}
# Read result colony counts and dilution factor
if (run_scripts) source("script/02A-pairs_experiment-02-colony_count.R")
```

There are 186x3=558 outcomes of pairwise competitions.

```{r}
pairs_competition <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_competition.csv", col_types = cols())
pairs_competition
```

67 of of 558 pair-frequency without determined outcomes of pairwise competitions will be determined by using CASEU method

```{r}
pairs_competition %>% filter(is.na(ColonyCount))
```

186 pair IDs saved in `data/temp/pairs_ID.csv`

```{r}
pairs_ID <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_ID.csv", col_types = cols())
pairs_ID
```



## Ambiguous pairs and isolates on TSA agar plates

```{r}
if (run_scripts) source("script/02A-pairs_experiment-03-pairwise_ambiguous.R")
```


There are 67 pair-freqs that the competition outcome cannot be determined by TSA plate counting because of ambiguous morphology. The ambiguous pairs will be later examined by using selective media or Sanger sequencing. 

```{r}
pairs_ambiguous <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_ambiguous.csv", col_types = cols())
pairs_ambiguous
```

In total, 28 pairs

```{r}
pairs_ambiguous %>% 
  group_by(Community, Isolate1, Isolate2) %>%
  summarize(Count = n(), .groups = "keep")
```

36 isolates involved in the ambiguous pairs. 

```{r}
isolates_ambiguous <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_ambiguous.csv", col_types = cols())
isolates_ambiguous
```



## Map pairs and isolates to the DW96 plate layout

```{r}
if (run_scripts) source("script/02A-pairs_experiment-04-pairwise_plate_layout.R")
```

Each plate has 96 rows and has the following variables

```{r}
plates <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/plates.csv", col_types = cols())
plates
```


Plates for across-community and random assembly

```{r}
plates_random <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/plates_random.csv", col_types = cols())
plates_random
```



```{r fig.height = 4, fig.width = 3}
plates_list <- plates %>%
    filter(Plate == "P1") %>%
    distinct(Batch, PlateLayout) %>%
    rename(batch = Batch, platelayout = PlateLayout) %>%
    rowwise() %>%
    mutate(plate = filter(plates, Batch == batch, PlateLayout == platelayout, Plate == "P1") %>% list) %>%
    mutate(plate = prepare_plate_draw(plate) %>% list) %>%
    mutate(p_plate = draw_plate_from_df(plate) %>% list) 

plot_grid(plotlist = plates_list$p_plate, labels = paste0(plates_list$batch, " ", plates_list$platelayout),
          ncol = 2)

```

Plate layout that takes different initial frequencies:

- P1 is 50%:50%.

- P2 and P3 are identical and whose rows are 95% and columns are 5%.

The only exception is plate C11R1 in batch C, which only has one plate.

```{r}
OD <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/OD.csv", col_types = cols())

OD %>%
    filter(Community == "C1R2", Wavelength == 620) %>%
    ggplot(aes(x = Transfer, y = Abs, color = Isolate1Freq)) +
    geom_line() + geom_point() +
    facet_grid(Isolate1 ~ Isolate2) +
    theme_classic()
    
```



# 02B CASEU sanger sequencing

- To determine the relative abundance of of ambiguous pairs in competitive assays. 

- Use CASEU packages to analyse the Sanger sequencing of mixture culture.

- Sanger sequencing protocol preparation. Experimental protocol files are saved in folder `output/protocol/`

```{r}
list.files(here::here("output/protocol"), pattern = "caseu")
```

- Once we got the mixture Sanger sequences back, we implement analysis by using `CASEU` (Compositional analysis by Sanger electropherogram unmixing), an R package designed for quantifying relative abundance of Sanger sequences mixture. [source code](https://bitbucket.org/DattaManoshi/caseu). Install `CASEU` from bitbucket.

- Check out [package vignette](https://htmlpreview.github.io/?https://bitbucket.org/dattamanoshi/caseu/raw/master/doc/CASEU_Vignette.html).

```{r, echo = TRUE, eval = FALSE}
devtools::install_bitbucket('dattamanoshi/caseu') # Install CASEU package
library(CASEU)
```


## Test on example data

```{r}
if (run_scripts) source("script/02B-CASEU_sanger_seq-00-test.R")
```


## CASEU pilot1

The plate layout of PCR plate and list of samples are specified in `output/protocol/protocol_20190813_Sanger_seq_prep.pdf`. 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot1/protocol_20190813_Sanger_seq_prep-genewiz_table.csv", col_types = cols())
```

The isolates used in this round is from the list below.

Isolate         |  Code     |  Taxa
----------------|-----------|-------
C11R1 isolate 1 | A         | Pseudomonas
C1R7 isolate 2  | B         | Pseudomonas
C1R7 isolate 1  | C         | Enterobacter
C1R7 isolate 7  | D         | Raoultella
Table: isolates

Read trace matrices for isolates and mixtures

```{r}
if (run_scripts) source("script/02B-CASEU_sanger_seq-01-pilot1.R")
```


```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot1.csv", col_types = cols())
```


## CASEU pilot2

The plate layout of PCR plate and list of samples are specified in `output/protocol/protocol_20190813_Sanger_seq_prep.pdf`. There are 32 samples from pairwise competition and 16 samples from control. Also read Sylvie's data. 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot2/protocol_20190910_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
```

Isolates A B C D in control are the isolates below. 

Isolate         |  Code     |  Taxa
----------------|-----------|-------
C11R1 isolate 1 | A         | Pseudomonas
C1R7 isolate 2  | B         | Pseudomonas
C1R7 isolate 1  | C         | Enterobacter
C1R7 isolate 7  | D         | Raoultella

```{r}
if (run_scripts) source("script/02B-CASEU_sanger_seq-02-pilot2.R")
#source("script/02B-CASEU_sanger_seq-02a-pilot2_Sylvie.R") # Run Sylvie's sequence
```


In the 12 control synthetic pairs that were made of 4 isolates, compare these pairs' OD frequencies, colony counts, and CASEU predictions.

Read CASEU predicted frequencies and colony count frequencies in the 12 control pairs.

```{r}
CASEU_pilot2 <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot2.csv", col_types = cols())
CASEU_pilot2_plating <- read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot2/CASEU_pilot2_plating.csv", col_types = cols()) %>%
  mutate(Isolate1ColonyFreq = Isolate1Colony / (Isolate1Colony + Isolate2Colony),
    Isolate2ColonyFreq = Isolate2Colony / (Isolate1Colony + Isolate2Colony))
CASEU_pilot2
```



## CASEU pilot3

The plate layout of PCR plate and list of samples are specified in `output/protocol/protocol_20190923_SequalPrep_Sanger_prep.pdf`. 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot3/protocol_20190924_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
```

Read trace matrices for isolates and mixtures

```{r}
# It takes about ~15 mins to run 
if (run_scripts) source("script/02B-CASEU_sanger_seq-03-pilot3.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot3.csv", col_types = cols())
```


## CASEU pilot4

The plate layout of PCR plate and list of samples are specified in `output/protocol/protocol_20191007_Sanger_seq_prep.pdf`. 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot4/protocol_20191007_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
```

Read trace matrices for isolates and mixtures

```{r}
# It takes about ~15 mins to run
if (run_scripts) source("script/02B-CASEU_sanger_seq-04-pilot4.R")
```


```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot4.csv", col_types = cols())
```


## CASEU six plates

- Genewiz submission code: CASEU_RN_5 (which is not correct because these are not RN plates)
- Plates:
    - B2 T7 933 P2
    - B2 T7 444 P2
    - C2 T7 13A P2
    - C2 T7 13B P2
    - D T7 75 P2
    - D T7 5543 P2
- Notes: the plates must be placed in order!!

```{r}
# Generate sequencing csv
if (FALSE) {
genewiz_table <- tibble(Sample = 1:(96*6), 
       `DNA Name` = paste0(rep(c("B2_T7_933_P2", "B2_T7_444_P2", "C2_T7_13A_P2", "C2_T7_13B_P2",  "D_T7_75_P2", "D_T7_5543_P2"), each = 96), "_", rep(1:96, 6)), 
       `Length (bp)` = 1000, `Concentration (ng/uL)` = 0.83, Primer = "27F")

write_csv(genewiz_table, here::here("output/protocol/tab_fig/20211202_Sanger_seq_prep-genewiz_table_CYC.csv"))

}
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_sixplates.csv", col_types = cols())
```



## Random network, CASEU batch 1

- Genewiz submisstion code: Caseu_RN_1
- Plate layout: T3 C P2
- Description: samples 3, 4, 5, 12, 13, 27, 29, 33, 73, 85, 86, 91, and 94 (order by column in the plate layout) arrived Genewiz dry, so they are not processed.

```{r}
# If will take ~2 hr to run
if (run_scripts) source("script/02B-random_network_CASEU-01-batch1.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN1.csv", col_types = cols())
```



## Random network, CASEU batch 2

- Genewiz submission code: Caseu_RN_2
- Plate layout: T3 C P2
- Description: this plate layout contains the isolates for all 4 random assembled communities 

```{r}
# It takes ~30 mins
if (run_scripts) source("script/02B-random_network_CASEU-02-batch2.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN2.csv", col_types = cols())
```


## Random network, CASEU batch 3

- Genewiz submisstion code: Caseu_RN_3
- Plate layout: 
    - P1 (T0 C P2), tracking number 30-333649143
    - P2 (T0 AD P2), tracking number 30-333649365
    - P3 (T3 AD P2), tracking number 30-333649517
- Description: AD plates have the isolates assembled from the self-assembled communities but different communities

```{r}
# It takes ~1.5 hr
if (run_scripts) source("script/02B-random_network_CASEU-03-batch3.R")
```

Use T0 OD frequencies

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN3.csv", col_types = cols())
```



## Random network, CASEU batch 4

- Genewiz submisstion code: Caseu_RN_4
- Plate layout: T3 BD P3
- Description: This plate has the full AcrAss2 (B) and half of RanAss2 (D) network.


```{r}
if (run_scripts) source("script/02B-random_network_CASEU-04-batch4.R")
```


```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN4.csv", col_types = cols())
```




# 02C pairs_OD_CFU

- Error propagation theory

- Derive isolates' $\epsilon$ from T8 data and calculate uncertatinty using error propagation theory

- Convert T0 OD frequency $f^O$ to CFU frequency $f^C$ and estimate the uncertainty in converted CFU frequencies at T8. Output `data/temp/pairs_CFU_freq_uncertainty.csv`

## General formula of error propagation

This section explains the error propagation theory to estimate the uncertainty in the experimental measurement. There are one or more quantities $x, y, ...$, with corresponding uncertainties $\delta x, \delta y, ...$ and that we wish to use the measured values of x and y to calculate the quantity of real interest q. There are three provisional rules:

1. The square-root rule for counting experiments. The average number of event in time T is $v\pm\sqrt{v}$

2. Uncertainty in **sums and differences**. The computed mean value $q=x+y+z-(u+w)$. If the uncertainties in x, ..., w are known to be independent and random, then the uncertainty in the computed value of q is the quadratic sum $\delta q = \sqrt{(\delta x)^2+(\delta y)^2+(\delta z)^2+(\delta u)^2+(\delta w)^2}$ , In any case, $\delta q$ is never larger than their ordinary sum $\delta q \leqslant \delta x + \delta y+ \delta z + \delta u + \delta w$

3. Uncertainty in **product and quotients**. $q=\frac{x\times z}{u\times w}$, then the fractional uncertainty in the computed value q is the sum. If the uncertainties in x, ...,w are independent and random, then the fractional uncertainty in q is the sum in quadrature of the original uncertainties, $\frac{\delta q}{\left|q\right|} = \sqrt{(\frac{\delta x}{\left|x\right|})^2 + \frac{\delta y}{\left|y\right|})^2 +\frac{\delta z}{\left|z\right|})^2 +\frac{\delta u}{\left|u\right|})^2 + \frac{\delta w}{\left|w\right|})^2}$. $\frac{\delta q}{\left| q \right|} \leqslant \frac{\delta x}{\left| x \right|} + \frac{\delta z}{\left| z \right|} + \frac{\delta u}{\left| u \right|} + \frac{\delta w}{\left| w \right|}$ 

Taking these three provisional rules together, the general formula for error propagation takes the following form. Assume the computed quantity is a function of $x_1, x_2, ..., x_n$, the uncertainty in q is $$\delta q= \sqrt{(\sum{\frac{\partial q}{\partial x_i}}\delta x_i)^2}$$ 



## Uncertainty in epsilon 

```{r}
if (run_scripts) source("script/02C-pairs_OD_CFU-01-epsilon_uncertainty.R")
```


The uncertainties in each isolate' epsilon $\epsilon_A = \frac{OD_A DF_A v_A}{CFU_A}$ comes from four parts:

1. $DF$: Dilution factors. The uncertainty comes from the pipetting in serial dilution, which is calcaulted below.
2. $v$: Plating volumes. The systematic error for P20 set at 20 uL is 0.4 uL.
3. $CFU$: CFU counts. The uncertainty in CFU follows poisson distribution, which means that the variance is the same as the mean (measured CFU). The standard deviation is $\sqrt{CFU}$.
4. $OD$: OD measurement. The uncertainty in measuring OD in plate reader, which is assumed to be 0.001.


**Dilution factor**

The uncertainty in dilution factor comes from the pipetting steps in serial dilution, which include two pipetting volumes:

1. `V1`: serial dispensing 10 uL of diluted solution using mP20. It has uncertainty `ErrorV1` 2 uL.
2. `V2`: dispense 90 uL of PBS using mP200. It has uncertainty `ErrorV2` 0.4 uL.

For dilution factor $10^{n}$, it has the the mean $(\frac{V_1}{V_1+V_2})^n$. To obtain the uncertainty in the measured mean, first we caluculate the partial derivatives $\frac{\partial DF}{\partial V_1}$ and $\frac{\partial DF}{\partial V_2}$. Then the uncertainty $\delta DF = \sqrt{(\frac{\partial DF}{\partial V_1}\delta V_1)^2 + (\frac{\partial DF}{\partial V_2}\delta V_2)^2}$


```{r}
dilution_factor <- tibble(n = c(4, 5), V1 = 10, V2 = 90, ErrorV1 = 0.4, ErrorV2 = 2) %>%
  mutate(PartialV1 = n * V1^(n-1) * V2 / (V1+V2)^(n+1), #-n*(V1+V2)^(n-1)*V2/(V1^(n+1)),
         PartialV2 = -n * V1^(n-1) / (V1+V2)^(n+1), #n*(V1+V2)^(n-1)/(V1^n),
         DF = (V1/(V1+V2))^n,
         ErrorDF = sqrt((PartialV1*ErrorV1)^2 + (PartialV2*ErrorV2)^2))

dilution_factor
```


By using error propagation theroy, the uncertainty in isolates epsilon has the form 

$$\delta \epsilon = \sqrt{(\frac{\partial  \epsilon}{\partial DF}\delta DF)^2 +(\frac{\partial  \epsilon}{\partial CFU}\delta CFU)^2 + (\frac{\partial  \epsilon}{\partial OD}\delta OD)^2 + (\frac{\partial  \epsilon}{\partial V}\delta V)^2}$$


```{r}
isolates_epsilon_uncertainty <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_epsilon_uncertainty.csv", col_types = cols())
isolates_epsilon_uncertainty
```

There are 7 isolates that have either 0 CFU or negative OD value, so they don't have epsilon values.

```{r}
isolates_epsilon_uncertainty %>%
  filter(is.na(Epsilon))
```

## Convert T0 OD frequencies to CFU frequencies

```{r}
if (run_scripts) source("script/02C-pairs_OD_CFU-02-CFU_frequency.R")
```

The outcome of pairwise competition were determined by comparing the frequency changes between T0 and T8. The T8 frequencies were determined by plating the mature media on rich agar media on which the colonies were counted, whereas T0 frequencies were set to 95/5, 50/50 and 5/95 by mixing two isolate inocula with equal OD.  In this section, I will use the OD-CFU conversion coefficient $\epsilon$ derived from T8 isolate data to convert T0 OD frequencies into CFU frequencies. In specific, the CFU frequency of isolate 1 $f^C_1$ of a pair can be derived from OD frequencies of isolate 1 and 2 $f^O_1 ,f^O_2$, which have the relationship below. 

$f^C_1 = \frac{f^o_1\epsilon_1DFv}{f^o_1\epsilon_1DFv + f^o_2\epsilon_2DFv}=\frac{f^o_1\epsilon_1}{f^o_1\epsilon_1 + f^o_2\epsilon_2}$

where $\epsilon$ of each isolate was pre-calculated from T8 dataset. DF and v are the same in conversion.

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_CFU_freq.csv", col_types = cols())
```


## Uncertainty in T0 CFU frequencies

```{r}
if (run_scripts) source("script/02C-pairs_OD_CFU-03-CFU_frequency_uncertainty.R")
```

Write CFU frequencies to `data/temp/pairs_CFU_freq_uncertainty.csv`

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_CFU_freq_uncertainty.csv", col_types = cols())
```


# 02D determine pairwise interaction

## Combine outcomes of pairwise competition from CFU counts and CASEU

Raw data (e.g., CFU counts and CASEU Sanger sequences) are processed and generated into temporary result csv:

1. `02B-CASEU_sanger_seq` reads CASEU raw data and outputs `temp/CASEU_pilot2.csv` and `temp/CASEU_pilot3.csv`. Both files are CASEU predicted T8 frequencies.

2. `02C-pairs_OD_CFU` reads pair_competition and dilution factor data, and it outputs `temp/pairs_CFU_freq_uncertainty.csv`, which has the T0 OD-converted CFU frequencies and T8 CFU frequencies with uncertainties.


```{r}
if (run_scripts) source("script/02D-determine_pairwise_interaction-01-combine_CFU_CASEU_result.R")
```

The script in this section returns a data.frame `pairs_freq` that has the following variables:

- `Community`
- `Isolate1` and `Isolate2`: indices of isolates within a community. The number of isolate1 is always smaller than isolate2
- `Isolate1InitialODFreq` and `Isolate2InitialODFreq`: 5, 50 or 95. The initial OD frequencies of isolates at T0. This two serve as discrete grouping variables.
- `Time`: T0 or T8.
- `Isolate1MeasuredFreq`: the measured frequency od isolate1 in the pair. 
- `ErrorIsolate1MeasuredFreq`: the uncertainty of `Isolate1MeasuredFreq`.
- `RawDataType`: OD, ODtoCFU, CFU, Sanger (CASEU). The raw data type in which the isolate frequencies were measured.
- `Contamination`: logical. There are contamination in three pairs at T8 plates.

186x3x2=1116 pair-freq at two time points for the self-assembled community networks.


```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_freq.csv", col_types = cols())
```




## Determine pairwise interactions

```{r}
if (run_scripts) source("script/02D-determine_pairwise_interaction-02-determine_pairwise_interaction.R")
```

Table of all 27 + 4 possible combinations of fitness function and their interaction types 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_interaction_table.csv", col_types = cols())
```


Table of interaction types

```{r}
pairs_interaction_fitness <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_interaction_fitness.csv", col_types = cols())
pairs_interaction_fitness %>%
  group_by(Set, InteractionType) %>%
  summarize(Count = n(), .groups = "keep") 
```

## Isolate tournament


```{r}
if (run_scripts) source("script/02D-determine_pairwise_interaction-03-isolate_tournament.R")
```

Tournament ranks of each isolate. Note that I consider neturality and bistability as draw in the tournament.

- `Score`: the competitive scores of isolate. This score is computed by the formula: number of wins - number of loses + 0 * number of draws. 
- `Game`: number of pairwise competition the isolate has played. The number of games an isolate plated within a community should be community size minus one. 
- `Rank`: the ranks based on `Score`. The ranks range from 1 to the focal community size. Isolates with the same scores in a community are given the same rank.
- `PlotRank`: continuous rank for plotting convenience.


# 02E competition phylogeny

- Correlate competition result with phylogenetic distance

- Measure the pairwise phylogenetic distances by difference in 16S base pairs `pairs_taxonomy`
    - Coarse-grained taxonomy (Family and Genus)
    - Compute the pairwise sequence differences using `Biostring::pairwiseAlignment()` 
    - Compute pairwise tree distance using `ape::cophenetic.phylo()` on built tree. Try out different tree building methods


## Isolates' RDP taxonomy

```{r}
if (run_scripts) source("script/02E-competition_phylogeny-01-pairs_taxonomy.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_taxonomy.csv", col_types = cols())
```



## Isolates' 16S sequence difference


```{r}
if (run_scripts) source("script/02E-competition_phylogeny-02-pairs_16S.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_16S.csv", col_types = cols())
```




# Summary

## Read and combine pairs data 

```{r}
if (run_scripts) source("script/02-pairs-01-read_data.R")
```

186 pairs of self-assembled communities and 112 pair from across-community and random networks

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs.csv", col_types = cols())
```

Pairs with three initial frequencies and three. 186x3x2=1116

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_freq.csv", col_types = cols()) %>%
    filter(Set == "CFUandCASEU")
```

pairs_meta contains the growth traits of isolates. Note that the order of Isolate1 and Isolate2 in some pairs are flipped such that Isolate1 is always the dominant strain and Isolate 2 is the subdomaint one. Dominant strain is the winner in exclusion pairs, and the more abundant strain (50:50 treatment) in coexistence pairs.

```{r}
if (FALSE) pairs_meta <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_meta.csv", col_types = cols())
```

Example outcome types

```{r}
if (run_scripts) source("script/02-pairs-02-outcome_types.R")
```

Coarse-grained 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_example_outcomes.csv", col_types = cols())
```
Fine-grained 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_example_outcomes_finer.csv", col_types = cols())
```






